## Lecture 4

## **Speculative Instruction Execution**

- Dynamically scheduled pipelines (Ch. 3.4)
- → ✓ Speculative execution (3.4.2)
- → ✓ Dynamic branch prediction (3.4.3)

### **Putting it Together:**

- → ✓ Tomasulo + Branch prediction + Speculation (3.4.4)
- → ✓ Dynamic memory disambiguation (3.4.5)
- → ✓ Register renaming techniques (3.4.6)

## **Speculative Execution**

## **Non-speculative Execution**



## **Speculative Execution**



## **Speculative Execution**



## **Dynamic Branch Prediction**



## **Branch Target Buffer (BTB)**



### **Two Execution Modes**



Committed (non-speculative)

#### **Question:**

What is the difference between a **completed** and a **committed** instruction?



## Register Alias Table (RAT)

| Register<br>R0<br>R1<br>R2           | Tag<br><br><br>ROB entry      | Status Committed Committed Completed          |
|--------------------------------------|-------------------------------|-----------------------------------------------|
| <br>R31<br>F0<br>F1<br>F2<br><br>F31 | ROB entry ROB entry ROB entry | Completed Pending Pending Committed Committed |

## Register Alias Table (RAT)

#### **Question:**

Where can the values of registers R0, R1 and R2 be found given the content of the RAT?

#### **Answer:**

Registers R0 and R1 are committed so their values are found in the registerfile. Register R2 is completed, but not committed so the value is in ROB entry 0

| Register | Tag    | Status    |
|----------|--------|-----------|
| R0       |        | Committed |
| R1       |        | Committed |
| R2       | ROB 0  | Completed |
|          |        |           |
| R31      | ROB 1  | Completed |
| F0       | ROB 10 | Pending   |
| F1       | ROB 11 | Pending   |
| F2       |        | Committed |
|          |        |           |
| F31      |        | Committed |

| Entry | Instruction    | Dest. reg | Value  | Complete |
|-------|----------------|-----------|--------|----------|
| 15    | BNEZ R3,Loop   |           |        | YES      |
| 16    | L.S F0,0(R1)   | F0        | 1.2E-4 | YES      |
| 17    | L.S F1,0(R2)   | F1        | 5.8E-3 | YES      |
| 18    | ADD.S F2,F1,F0 | F2        | 2.0E-2 | YES      |
| 19    | S.S F2,0(R1)   |           |        | NO       |
| 20    | ADDI R1,R1,#4  | R1        | N/A    | NO       |
| 21    | ADDI R2,R2,#4  | R2        | N/A    | NO       |
| 22    | SUBI R3,R3,#1  | R3        | N/A    | NO       |
| 23    | BNEZ R3,Loop   |           |        |          |

#### **Question:**

Where can the values of F2 and R3 be found?

#### **Answer:**

F2 is in the reorder buffer but R3 is not available as an instruction is speculatively producing its value. This can be seen as the status is not completed.

| Entry | Instruction    | Dest. reg | Value  | Complete |
|-------|----------------|-----------|--------|----------|
| 15    | BNEZ R3,Loop   |           |        | YES      |
| 16    | L.S F0,0(R1)   | FO        | 1.2E-4 | YES      |
| 17    | L.S F1,0(R2)   | F1        | 5.8E-3 | YES      |
| 18    | ADD.S F2,F1,F0 | F2        | 2.0E-2 | YES      |
| 19    | S.S F2,0(R1)   |           |        | NO       |
| 20    | ADDI R1,R1,#4  | R1        | N/A    | NO       |
| 21    | ADDI R2,R2,#4  | R2        | N/A    | NO       |
| 22    | SUBI R3,R3,#1  | R3        | N/A    | NO       |
| 23    | BNEZ R3, Loop  |           |        |          |

| TOP →    | Entry 15 16 17 18 19 20 21 22 23 | Instruction  BNEZ R3, Loop  L.S F0, 0 (R1)  L.S F1, 0 (R2)  ADD.S F2, F1, F0  S.S F2, 0 (R1)  ADDI R1, R1, #4  ADDI R2, R2, #4  SUBI R3, R3, #1  BNEZ R3, Loop | Dest. reg F0 F1 F2 R1 R2 R3 | Value 1.2E-4 5.8E-3 2.0E-2 N/A N/A N/A | Complete YES YES YES YES NO NO NO NO NO NO | Committed Completed Completed Completed Pending Pending Pending Pending Pending Pending |
|----------|----------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------|----------------------------------------|--------------------------------------------|-----------------------------------------------------------------------------------------|
| DOTTON — | ∠3                               | RNEZ K3, Loop                                                                                                                                                  |                             |                                        |                                            | Pending                                                                                 |

Tag = Reorder buffer entry

|          | Entry    | Instruction    | Dest. reg | Value  | Complete |           |
|----------|----------|----------------|-----------|--------|----------|-----------|
| TOP 🗪    | 16       | L.S F0,0(R1)   | F0        | 1.2E-4 | YES      | Committed |
|          | 17       | L.S F1,0(R2)   | F1        | 5.8E-3 | YES      | Completed |
|          | 18       | ADD.S F2,F1,F0 | F2        | 2.0E-2 | YES      | Completed |
|          | 19       | S.S F2,0(R1)   |           |        | YES      | Completed |
|          | 20       | ADDI R1,R1,#4  | R1        | N/A    | NO       | Pending   |
|          | 21       | ADDI R2,R2,#4  | R2        | N/A    | NO       | Pending   |
|          | 22       | SUBI R3,R3,#1  | R3        | N/A    | NO       | Pending   |
|          | 23       | BNEZ R3,Loop   |           | N/A    | NO       | Pending   |
| BOTTOM - | 24       | L.S F0,0(R1)   | F0        | N/A    | NO       | Pending   |
| BOTTON — | <u> </u> | ⊔.5 f∪,U(RI)   | ГU        | N/A    | NO       | rending   |

# **Speculative Execution**



Here, the branch instruction is speculatively executed

L.S F1,0(R2)
ADD.S F2,F1,F0
S.S F2,0(R1)
ADDI R1,R1,#4
ADDI R2,R2,#4
SUBI R3,R3,#1
BNEZ R3,Loop



ADDI R2,R2,#4 SUBI R3,R3,#1 BNEZ R3,Loop



SUBI R3,R3,#1 BNEZ R3,Loop



Here, the branch instruction commits

L.S F1,0(R2)

ADD.S F2,F1,F0

S.S F2,0(R1)

ADDI R1,R1,#4

ADDI R2,R2,#4

SUBI R3,R3,#1

BNEZ R3,Loop





L.S F1,0(R2)
ADD.S F2,F1,F0
S.S F2,0(R1)
ADDI R1,R1,#4
ADDI R2,R2,#4
SUBI R3,R3,#1
BNEZ R3,Loop







Mispredicted Branch

L.S F0,0(R1)
L.S F1,0(R2)
ADD.S F2,F1,F0
S.S F2,0(R1)
ADDI R1,R1,#4
ADDI R2,R2,#4
SUBI R3,R3,#1
BNEZ R3,Loop

#### **Question:**

Consider the program below and determine how many cycles it takes to execute two iterations on the speculative pipeline assuming that the branch is correctly predicted.

Loop L.S F0,0(R1)

ADD.S F2,F1,F0

S.S F2,0(R1)

ADDI R1,R1,#4

SUBI R3,R3,#1

BNEZ R3,Loop

**Answer:** 18 cycles

Loop I1:L.S F0,0(R1)

12:ADD.S F2,F1,F0

I3:S.S F2,0(R1)

I4:ADDI R1,R1,#4

I5:SUBI R3,R3,#1

16:BNEZ R3,Loop

|      | C1 | C2 | C3 | C4 | C5 | C6 | <b>C</b> 7 | C8 | <b>C</b> 9 | C10 | C11 | C12 | C13 | C14 | C15 | C16 | C17 | C18 | C19 | C20 | C21 | C22 | C23 | C24 |
|------|----|----|----|----|----|----|------------|----|------------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 11,1 | IF | ID | IS | EX | WB |    |            |    |            |     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |
| 12,1 |    | IF | ID | IS | EX | EX | EX         | EX | EX         | WB  |     |     |     |     |     |     |     |     |     |     |     |     |     |     |
| 13,1 |    |    | IF | ID | IS | IS | IS         | IS | IS         | EX  | WB  |     |     |     |     |     |     |     |     |     |     |     |     |     |
| 14,1 |    |    |    | IF | ID | IS | EX         | WB |            |     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |
| 15,1 |    |    |    |    | IF | ID | IS         | EX | WB         |     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |
| 16,1 |    |    |    |    |    | IF | ID         | IS | EX         | WB  | WB  | WB  |     |     |     |     |     |     |     |     |     |     |     |     |
| 11,2 |    |    |    |    |    |    | IF         | ID | IS         | EX  | WB  | WB  | WB  |     |     |     |     |     |     |     |     |     |     |     |
| 12,2 |    |    |    |    |    |    |            | IF | ID         | IS  | EX  | EX  | EX  | EX  | EX  | WB  |     |     |     |     |     |     |     |     |
| 13,2 |    |    |    |    |    |    |            |    | IF         | ID  | IS  | IS  | IS  | IS  | IS  | EX  | WB  |     |     |     |     |     |     |     |
| 14,2 |    |    |    |    |    |    |            |    |            | IF  | ID  | IS  | EX  | WB  |     |     |     |     |     |     |     |     |     |     |
| 15,2 |    |    |    |    |    |    |            |    |            |     | IF  | ID  | IS  | EX  | WB  |     |     |     |     |     |     |     |     |     |
| 16,2 |    |    |    |    |    |    |            |    |            |     |     | IF  | ID  | IS  | EX  | WB  | WB  | WB  |     |     |     |     |     |     |

#### ROB

#### RAT

| Register | Tag | Status |
|----------|-----|--------|
| R1       | "   |        |
| R2       |     |        |
| R3       |     |        |
| F0       |     |        |
| F1       |     |        |
| F2       |     |        |
| F3       |     |        |
| F4       |     |        |
|          | J   |        |

| Entry    | Instruction    | Dest. reg | Value  | Complete |
|----------|----------------|-----------|--------|----------|
| 16       | L.S F0,0(R1)   | F0        | 1.2E-4 | YES      |
| 17       | L.S F1,0(R2)   | F1        | 5.8E-3 | YES      |
| 18       | ADD.S F2,F1,F0 | F2        |        | NO       |
| 19       | ADD.S F1,F3,F4 | F1        |        | NO       |
| 20       | SD F2,0(R1)    | F2        | N/A    | NO       |
| 21       | ADDI R1,R1,#4  | R1        | N/A    | NO       |
| 22       | ADDI R2,R2,#4  | R2        | N/A    | NO       |
| 23       | SUBI R3,R3,#1  | R3        | N/A    | NO       |
| 24       | BNEZ R3,Loop   | F0        | N/A    | NO       |
| 25       | L.S F0,0(R1)   |           | N/A    | NO       |
|          |                |           |        |          |
|          |                |           |        |          |
| <u> </u> | •              |           |        |          |

#### **Register Renaming: Structures**



- Frontend RAT points to most recent value. Retirement RAT points to most recent retired value (could be the same)
- More speculative values of the same architectural register may be in the register file
- "Updating register" now means "Updating the pointer in the retirement RAT"

#### Register Fetch After Issue



### **Memory disambiguation**

Loads and stores must execute in program order

Example)

SD.S F0, 0(R2)

LD.S F1, 0(R2)

If Mem[(R2)] = 0, and (F2) = 100 what will be in F1 after both instructions are executed?

SD.S F0, 0(R2)

LD.S F1, 0(R3)

What will be in F1 now? Could it be 100 under any condition?, what condition is that?

#### **Exception handling**

- Note: On a misspeculated branch instruction, all instructions before that branch will be fully executed and all instructions after will be squashed
- This is EXACTLY what is needed to handle PRECISE EXCEPTIONS